Method and system relating to sentiment analysis of electronic content

ABSTRACT

Users receive information which must be filtered, processed, analysed, reviewed, consolidated and distributed or acted upon. Prior art tools that automatically process content to assign sentiment to the content are ineffective as essential aspects such as context are not considered. Embodiments of the invention provide automatic contextual based sentiment classification of content in terms of both the sentiments expressed and their intensity. Further, a content set is analysed to rapidly establish an “at-a-glance” type assessment of the key topics/themes present within the content set and sentimentally annotate each. Importantly, embodiments of the invention also provide for a user to establish the basis for the sentiment associated with an item of or set of content, i.e. make it explainable. Further embodiments of the invention provide for the establishment of psychological tone to sentiments where the sentiments and psychological tones are tuned from the context or domain of the content.

CROSS-REFERENCE TO RELATED APPLICATIONS

This patent application claims the benefit of U.S. Provisional Patent Application 61/647,183 filed May 15, 2012 entitled “Method and System of Managing Content” the entire contents of which are incorporated by reference.

FIELD OF THE INVENTION

The present invention relates to published content and more specifically to the processing of published content for users to associate sentiment to the content.

BACKGROUND OF THE INVENTION

In 2008, Americans consumed information for approximately 1.3 trillion hours, or an average of almost 12 hours per day per person (Global Information Industry Center, University of California at San Diego, January 2010). Consumption totaled 3.6 zettabytes (3.6×10²¹ bytes) and 10,845 trillion (10,845×10¹²) words, corresponding to 100,500 words and 34 gigabytes for an average person on an average day. This information came from over twenty different sources, from newspapers and books through to online media, social media, satellite radio, and Internet video, although the traditional media of radio and TV still dominated consumption per day.

Computers and the Internet have had major effects on some aspects of information consumption. In the past, information consumption was overwhelmingly passive, with telephone being the only interactive medium. However, with computers, a full third of words and more than half of digital data are now received interactively. Reading, which was in decline due to the growth of television, tripled from 1980 to 2008, because it is the overwhelmingly preferred way to receive words on the Internet. At the same time portable electronic devices and the Internet have resulted in a large portion of the population, in the United States for example, becoming active generators of information throughout their daily lives as well as active consumers augmenting their passive consumption. Social media such as Facebook™ and Twitter™, blogs, website comment sections, Bing™, and Yahoo™ have all contributed in different ways to the active generation of information by individuals which augments that generated by enterprises, news organizations, Government, and marketing organizations.

Globally the roughly 27 million computer servers active in 2008 processed 9.57 zettabytes of information (Global Information Industry Center, University of California at San Diego, April 2011). This study also estimated that enterprise server workloads are doubling about every two years and, whilst a substantial portion of this information is incredibly transient, overall the amount of information created, used, and retained is growing steadily.

The exploding growth in stored collections of numbers, images and other data represents one facet of information management for organizations, enterprises, Governments and individuals. However, even what was once considered “mere data” becomes more important when it is actively processed by servers as representing meaningful information delivered for an ever-increasing number of uses. Overall the 27 million computer servers were estimated as providing an average of 3 terabytes of information per year to each of the estimated 3.18 billion workers in the world's labor force.

Increasingly, a corporation's competitiveness hinges on its ability to employ innovative search techniques that help users discover data and obtain useful results. In some instances automatically offering recommendations for subsequent searches or extracting related information are beneficial. To gain some insight into the magnitude of the problem consider the following:

-   in 2009 around 3.7 million new domains were registered each month and as of June 2011 this had increased to approximately 4.5 million per month;
-   approximately 45% of Internet users are under 25;
-   there are approximately 600 million wired and 1,200 million wireless broadband subscriptions globally;
-   approximately 85% of wireless handsets shipped globally in 2011 included a web browser;
-   there are approximately 2.1 billion Internet users globally with approximately 2.4 billion social networking accounts;
-   there are approximately 800 million users on Facebook™ and approximately 225 million Twitter™ accounts;
-   there are approximately 250 million tweets per day and approximately 250 million Facebook activities;
-   there are approximately 3 billion Google™ searches and 300 million Yahoo™ searches per day.

Accordingly it would be evident that users face an overwhelming barrage of information (content) that must be filtered, processed, analysed, reviewed, consolidated and distributed or acted upon. For example a market researcher seeking to determine the perception of a particular product may wish to rapidly collate sentiments from reviews sourced from websites, press articles, and social media.

Similarly, a search by a user using the terms “Barack Obama Afghanistan” with Google™ run on May 2, 2012 returns approximately 324 million “hits” in a fraction of a second. These are displayed, by default in the absence of other filters by the user, in an order determined by rules executed by Google™ servers relating to factors including, but not limited to, match to user entered keywords and the number of times a particular webpage or item of content has been opened. However, within this search the same content may be reproduced multiple times in different sources legitimately as well as having been plagiarized partially into other sources as well as the same event being presented through different content on other websites. Accordingly, different occurrences of Barack Obama visiting Afghanistan or different aspects of his visit to Afghanistan may become buried in an overwhelming reporting of his last visit or the repeated occurrence of strategic photo opportunities during the visit during a campaign.

Accordingly, it would be beneficial for the user to be able to retrieve a collection of multiple items of content, commonly referred to as documents, which mention one or more concepts or interests, and automatically cluster them into cohesive groups that relate to the same concepts or interests. Each cohesive group (or cluster) formed thereby consists of one or more documents from the original collection which describe the same concept or interest even where the documents have perhaps a different vocabulary. Even when a user identifies an item of content of interest, for example a review of a product, then the salient text may be buried within a large amount of other content or alternatively the item of content may be formatted for display upon laptops, tablet PCs, etc. whereas the user is accessing the content on a portable electronic device such as a smartphone or portable gaming console for example.

Accordingly it would be beneficial for the user to be able to access the salient text contained in one or more items of content, based on learned semantic and content structure cues so that extraneous elements of the item of content are removed. Accordingly it would be beneficial to provide a tool for inducing content scraping automatically to filter content to that necessary or automatically extracting core text for viewing on constrained screen devices or vocalizing through a screen reader. Automated summarization or text simplification may also form extensions of the scraper.

Other aspects and features of the present invention will become apparent to those ordinarily skilled in the art upon review of the following description of specific embodiments of the invention in conjunction with the accompanying figures.

SUMMARY OF THE INVENTION

It is an object of the present invention to provide improvements in the art relating to published content and more specifically to the processing of published content for users to associate sentiment to content, cluster content for review, and extract core text.

In accordance with an embodiment of the invention there is provided a method comprising:

-   receiving an item of content;
-   parsing the item of content with a microprocessor to generate a linguistic annotated item of content with language associations;
-   retrieving from a term selection rules repository stored upon a memory at least a rule of a plurality of rules;
-   applying with the microprocessor the at least a rule of the plurality of rules to establish a set of candidate sentiment carrying terms within the linguistic annotated item of content;
-   querying the set of candidate sentiment carrying terms against a target-domain sentiment lexicon to generate a set of sentiment labeled terms; and
-   applying to the linguistic annotated item of content a set of sentiment labeling rules established in dependence of at least the set of sentiment labeled terms to generate a sentiment label for the item of content.
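The steps of this method can be sketched in a few lines of Python. This is a minimal illustration only, not the claimed implementation: the lexicon entries, the stopword-based term selection rule, the negation-based labeling rule, and all function names are hypothetical stand-ins for the repositories and rules recited above, and simple tokenization stands in for linguistic annotation.

```python
import re

# Hypothetical target-domain sentiment lexicon (finance): term -> polarity.
FINANCE_LEXICON = {"conservative": 1, "growth": 1, "volatile": -1, "loss": -1}

# Hypothetical term selection rule: discard stopwords, keep everything else.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "with", "is", "was", "no", "not"}
NEGATORS = {"no", "not"}


def parse(text):
    """Stand-in for linguistic annotation: lowercase tokens with positions."""
    return list(enumerate(re.findall(r"[a-z']+", text.lower())))


def select_candidates(annotated):
    """Apply the term selection rule to obtain candidate sentiment terms."""
    return [(i, tok) for i, tok in annotated if tok not in STOPWORDS]


def label_terms(candidates, lexicon):
    """Query candidates against the lexicon to obtain sentiment labeled terms."""
    return [(i, tok, lexicon[tok]) for i, tok in candidates if tok in lexicon]


def label_content(annotated, labeled_terms):
    """Hypothetical labeling rule: a negator directly before a term flips it."""
    tokens = dict(annotated)
    score = 0
    for i, _tok, polarity in labeled_terms:
        if tokens.get(i - 1) in NEGATORS:
            polarity = -polarity
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def classify(text, lexicon=FINANCE_LEXICON):
    annotated = parse(text)
    labeled = label_terms(select_candidates(annotated), lexicon)
    return label_content(annotated, labeled)
```

Because the label is derived from an explicit list of labeled terms and rules, each intermediate result (`select_candidates`, `label_terms`) can be inspected to see which terms drove the outcome.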

In accordance with an embodiment of the invention there is provided a method comprising:

-   a) receiving an item of content;
-   b) receiving upon a microprocessor an indication of a predetermined portion of the item of content to analyze;
-   c) establishing with the microprocessor a plurality of positive sentiment terms and a plurality of negative sentiment terms;
-   d) parsing with the microprocessor the predetermined portion of the item of content to count occurrences of a positive sentiment term of the plurality of positive sentiment terms to establish a positive sentiment count;
-   e) parsing with the microprocessor the predetermined portion of the item of content to count occurrences of a negative sentiment term of the plurality of negative sentiment terms to establish a negative sentiment count; and
-   f) determining with the microprocessor a sentiment label to associate with the item of content in dependence upon at least one of the occurrences of the positive sentiment term and occurrences of the negative sentiment term.
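Steps a) through f) amount to counting term occurrences within a chosen portion of the content. The Python sketch below illustrates this under stated assumptions: the portion is indicated by hypothetical character offsets `start` and `end`, the term sets are supplied by the caller, and the label is chosen by simple comparison of the two counts, which is only one of many possible decision rules.

```python
import re
from collections import Counter


def count_sentiment(text, positive_terms, negative_terms, start=0, end=None):
    """Return (positive count, negative count, label) for text[start:end]."""
    portion = text[start:end]                      # step b): portion to analyze
    counts = Counter(re.findall(r"[a-z']+", portion.lower()))
    positive = sum(counts[t] for t in positive_terms)   # step d)
    negative = sum(counts[t] for t in negative_terms)   # step e)
    if positive > negative:                        # step f): assumed rule
        label = "positive"
    elif negative > positive:
        label = "negative"
    else:
        label = "neutral"
    return positive, negative, label


review = "Great screen, great battery, but terrible support."
positive_terms = {"great", "good"}
negative_terms = {"terrible", "poor"}

whole = count_sentiment(review, positive_terms, negative_terms)
tail = count_sentiment(review, positive_terms, negative_terms,
                       start=review.index("but"))
```

Restricting the analysis to the final clause changes the outcome, which shows why the indication of the portion in step b) matters.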

In accordance with an embodiment of the invention there is provided a method comprising:

-   receiving an item of content;
-   processing with a microprocessor the item of content to determine occurrences of content sentiment-carrying terms;
-   displaying to a user the sentiment labels of content sentiment-carrying terms within the item of content; and
-   presenting to the user any sentiment intensity variation based on matching at least one of a predetermined sentence and a phrasal syntactic structure of the document with a repository of syntactic structure patterns.

In accordance with an embodiment of the invention there is provided a method comprising:

-   a) receiving a plurality of items of content;
-   b) identifying with a microprocessor within the plurality of items of content at least a core multi-item concept of a plurality of core multi-item concepts, each core multi-item concept relating to a concept contained at least within a predetermined portion of the plurality of items of content;
-   c) selecting a core multi-item concept from the plurality of core multi-item concepts; and
-   d) establishing with the microprocessor a sentiment relating to the core multi-item concept for the plurality of items of content.
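One way to picture steps a) through d) is shown below. This is a simplified sketch under several assumptions that are not taken from the specification: a "concept" is approximated by a single term, a concept is "core" when it appears in at least a fraction `min_fraction` of the items, and its sentiment is aggregated from sentiment terms that share a sentence with it. The term sets and function names are hypothetical.

```python
import re
from collections import Counter

POSITIVE = {"great", "excellent"}
NEGATIVE = {"poor", "terrible"}
STOPWORDS = {"the", "is", "a", "an", "and"}


def tokens(text):
    return re.findall(r"[a-z']+", text.lower())


def core_concepts(items, min_fraction=0.5):
    """Step b): terms occurring in at least min_fraction of the items."""
    document_frequency = Counter()
    for item in items:
        document_frequency.update(set(tokens(item)) - STOPWORDS - POSITIVE - NEGATIVE)
    threshold = min_fraction * len(items)
    return sorted(t for t, n in document_frequency.items() if n >= threshold)


def concept_sentiment(items, concept):
    """Step d): aggregate sentiment from sentences that mention the concept."""
    score = 0
    for item in items:
        for sentence in re.split(r"[.!?]", item):
            words = tokens(sentence)
            if concept in words:
                score += sum(w in POSITIVE for w in words)
                score -= sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


reviews = [
    "The battery is great. The case is poor.",
    "Excellent battery life.",
    "The screen is terrible.",
]
concepts = core_concepts(reviews)                  # step b)
selected = concepts[0]                             # step c)
sentiment = concept_sentiment(reviews, selected)   # step d)
```

Scoring only the sentences that mention the selected concept keeps the negative remark about the case from affecting the sentiment reported for the battery.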

Other aspects and features of the present invention will become apparent to those ordinarily skilled in the art upon review of the following description of specific embodiments of the invention in conjunction with the accompanying figures.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the present invention will now be described, by way of example only, with reference to the attached Figures, wherein:

FIG. 1A depicts a network accessible by a user and content sources accessible to the user with respect to embodiments of the invention;

FIG. 1B depicts an electronic device supporting communications and interactions for a user according to embodiments of the invention;

FIGS. 2A and 2B depict a machine based sentiment learning and classification process according to the prior art;

FIG. 3 depicts a flowchart of a process for a sentiment classification process using a target-domain sentiment lexicon according to an embodiment of the invention;

FIG. 4 depicts a flowchart of a process for a target domain sentiment lexicon generation process according to an embodiment of the invention; and

FIG. 5 depicts a process flow for associating key concepts within multiple documents and associating sentiments to the key concepts according to an embodiment of the invention.

DETAILED DESCRIPTION

The present invention is directed to published content and more specifically to the processing of published content for users to associate sentiment to content, cluster content for review, and extract core text.

The ensuing description provides exemplary embodiment(s) only, and is not intended to limit the scope, applicability or configuration of the disclosure. Rather, the ensuing description of the exemplary embodiment(s) will provide those skilled in the art with an enabling description for implementing an exemplary embodiment. It is understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope as set forth in the appended claims.

A “portable electronic device” (PED) as used herein and throughout this disclosure, refers to a wireless device used for electronic communications that requires a battery or other independent form of energy for power. This includes, but is not limited to, devices such as a cellular telephone, smartphone, personal digital assistant (PDA), portable computer, pager, portable multimedia player, portable gaming console, laptop computer, tablet computer, and an electronic reader. A “fixed electronic device” (FED) as used herein and throughout this disclosure, refers to a wired or wireless device used for electronic communications that may be dependent upon a fixed source of power, or employ a battery or other independent form of energy for power. This includes, but is not limited to, devices such as a portable computer, personal computer, Internet enabled display, gaming console, computer server, kiosk, and a terminal.

A “network operator/service provider” as used herein may refer to, but is not limited to, a telephone or other company that provides services for mobile phone subscribers including voice, text, and Internet; a telephone or other company that provides services for subscribers including but not limited to voice, text, Voice-over-IP, and Internet; a telephone, cable or other company that provides wireless access to local area, metropolitan area, and long-haul networks for data, text, Internet, and other traffic or communication sessions; etc.

“Content”, “input content” and/or “document” as used herein and through this disclosure refers to an item or items of information stored electronically and accessible to a user for retrieval or viewing. This includes, but is not limited to, documents, images, spreadsheets, databases, audiovisual data, multimedia data, encrypted data, SMS messages, social media data, data formatted according to a markup language, and information formatted according to a portable document format.

A “web browser” as used herein and through this disclosure refers to a software application for retrieving, presenting, and traversing information resources on the World Wide Web, each identified by a Uniform Resource Identifier (URI), where a resource may be a web page, image, video, or other piece of content. The web browser also allows a user to access and implement hyperlinks present in accessed resources to navigate their browsers to related resources. A web browser may also be defined within the scope of this specification as an application software or program designed to enable users to access, retrieve and view documents and other resources on the Internet as well as access information provided by web servers in private networks or files in file systems.

An “application” as used herein and through this disclosure refers to a software application, also known as an “app”, which is computer software designed to help the user to perform specific tasks. This includes, but is not limited to, web browser, enterprise software, accounting software, information work software, content access software, education software, media development software, office suites, presentation software, word processing software, spreadsheets, graphics software, email and blog client software, personal information systems and desktop publishing software. Many application programs deal principally with multimedia, documentation, and/or audiovisual content in conjunction with a markup language for annotating a document in a way that is syntactically distinguishable from the content. Applications may be bundled with the computer and its system software, or may be published separately.

A “user,” as used herein and through this disclosure refers to, but is not limited to, a person or device that generates, receives, analyses, or otherwise accesses content stored electronically within a portable electronic device, fixed electronic device, network accessible server, or other source storing content.

A “server” as used herein and through this disclosure refers to a computer program running to serve the requests of other programs, the “clients”. Thus, the “server” performs some computational task on behalf of “clients” which may either run on the same computer or connect through a network. Such “clients” are accordingly applications in execution by one or more users on their PED/FED or remotely at a server. Such a server may be one or more physical computers dedicated to running one or more services as a host. Examples of a server include, but are not limited to, database server, file server, mail server, print server, and web server.

Referring to FIG. 1A there is depicted a network supporting communications and interactions between devices connected to the network and executing functionalities according to embodiments of the invention with first and second user groups 100A and 100B respectively connected to a telecommunications network 100. Within the representative telecommunication architecture a remote central exchange 180 communicates with the remainder of a telecommunication service provider's network via the network 100 which may include for example long-haul OC-48/OC-192 backbone elements, an OC-48 wide area network (WAN), a Passive Optical Network, and a Wireless Link. The remote central exchange 180 is connected via the network 100 to local, regional, and international exchanges (not shown for clarity) and therein through network 100 to first and second wireless access points (AP) 120 and 110 respectively which provide Wi-Fi cells for first and second user groups 100A and 100B respectively.

Within the cell associated with first AP 120 the first group of users 100A may employ a variety of portable electronic devices (PEDs) including for example, laptop computer 155, portable gaming console 135, tablet computer 140, smartphone 150, cellular telephone 145 as well as portable multimedia player 130. Within the cell associated with second AP 110 the second group of users 100B may employ a variety of portable electronic devices (not shown for clarity) but may also employ a variety of fixed electronic devices (FEDs) including for example gaming console 125, personal computer 115 and wireless/Internet enabled television 120 as well as cable modem 105 which links second AP 110 to the network 100.

Also connected to the network 100 is cell tower 125 that provides, for example, cellular GSM (Global System for Mobile Communications) telephony services as well as 3G and 4G evolved services with enhanced data transport support. Cell tower 125 provides coverage in the exemplary embodiment to first and second user groups 100A and 100B. Alternatively the first and second user groups 100A and 100B may be geographically disparate and access the network 100 through multiple cell towers, not shown for clarity, distributed geographically by the network operator or operators. Accordingly, the first and second user groups 100A and 100B may according to their particular communications interfaces communicate to the network 100 through one or more communications standards such as, for example, IEEE 802.11, IEEE 802.15, IEEE 802.16, IEEE 802.20, UMTS, GSM 850, GSM 900, GSM 1800, GSM 1900, GPRS, ITU-R 5.138, ITU-R 5.150, ITU-R 5.280, and IMT-2000. It would be evident to one skilled in the art that many portable and fixed electronic devices may support multiple wireless protocols simultaneously, such that for example a user may employ GSM services such as telephony and SMS and Wi-Fi/WiMAX data transmission, VoIP and Internet access.

Also connected to the network 100 are first and second servers 110A and 110B respectively which host according to embodiments of the invention multiple services associated with content from one or more sources including for example, but not limited to:

-   social media 160 such as Facebook™, Twitter™, LinkedIn™ etc;
-   web feeds 165, such as those formatted according to RSS and/or Atom formats, to publish frequently updated works;
-   web portals 170 such as Yahoo™, Google™, Baidu™, and Microsoft's Bing™ for example;
-   broadcasters 175 including Fox, NBC, CBS, and Comcast for example who provide content via multiple media including for example satellite, cable, and Internet;
-   print media 180 including for example USA Today, Washington Post, Los Angeles Times and China Daily;
-   websites 185 including, but not limited to, manufacturers, market research, consumer research, newspapers, journals, and financial institutions.

Also connected to network 100 is application server 105 which provides software system(s) and software application(s) associated with receiving retrieved content and processing said published content for users to associate sentiment to content, cluster content for review, and extract core text as discussed below in respect of embodiments of the invention. First and second servers 110A and 110B and application server 105 together with other servers not shown for clarity may also provide dictionaries, speech recognition software, product databases, inventory management databases, retail pricing databases, shipping databases, customer databases, software applications for download to fixed and portable electronic devices, as well as Internet services such as a search engine, financial services, third party applications, directories, mail, mapping, social media, news, user groups, and other Internet based services.

Referring to FIG. 1B there is depicted an electronic device 1004, supporting communications and interactions according to embodiments of the invention with local and/or remote services. Electronic device 1004 may be for example a PED, FED, a terminal, or a kiosk. Also depicted within the electronic device 1004 is the protocol architecture as part of a simplified functional diagram of a system 1000 that includes an electronic device 1004, such as a smartphone 155, an access point (AP) 1006, such as first Wi-Fi AP 110, and one or more remote servers 1007, such as communication servers, streaming media servers, and routers for example such as first and second servers 110A and 110B respectively. Remote server cluster 1007 may be coupled to AP 1006 via any combination of networks, wired, wireless and/or optical communication links such as discussed above in respect of FIG. 1A. The electronic device 1004 includes one or more processors 1010 and a memory 1012 coupled to processor(s) 1010. AP 1006 also includes one or more processors 1011 and a memory 1013 coupled to processor(s) 1011. A non-exhaustive list of examples for any of processors 1010 and 1011 includes a central processing unit (CPU), a digital signal processor (DSP), a reduced instruction set computer (RISC), a complex instruction set computer (CISC) and the like. Furthermore, any of processors 1010 and 1011 may be part of application specific integrated circuits (ASICs) or may be a part of application specific standard products (ASSPs). A non-exhaustive list of examples for memories 1012 and 1013 includes any combination of the following semiconductor devices such as registers, latches, ROM, EEPROM, flash memory devices, non-volatile random access memory devices (NVRAM), SDRAM, DRAM, double data rate (DDR) memory devices, SRAM, universal serial bus (USB) removable memory, and the like.

Electronic device 1004 may include an audio input element 1014, for example a microphone, and an audio output element 1016, for example, a speaker, coupled to any of processors 1010. Electronic device 1004 may include a video input element 1018, for example, a video camera, and a video output element 1020, for example an LCD display, coupled to any of processors 1010. Electronic device 1004 includes one or more applications 1022 that are typically stored in memory 1012 and are executable by any combination of processors 1010. Electronic device 1004 includes a protocol stack 1024 and AP 1006 includes a communication stack 1025. Within system 1000 protocol stack 1024 is shown as IEEE 802.11 protocol stack but alternatively may exploit other protocol stacks such as an Internet Engineering Task Force (IETF) multimedia protocol stack for example. Likewise AP stack 1025 exploits a protocol stack but is not expanded for clarity. Elements of protocol stack 1024 and AP stack 1025 may be implemented in any combination of software, firmware and/or hardware. Protocol stack 1024 includes an IEEE 802.11-compatible PHY module 1026 that is coupled to one or more Front-End Tx/Rx & Antenna 1028, an IEEE 802.11-compatible MAC module 1030 coupled to an IEEE 802.2-compatible LLC module 1032. Protocol stack 1024 includes a network layer IP module 1034, a transport layer User Datagram Protocol (UDP) module 1036 and a transport layer Transmission Control Protocol (TCP) module 1038.

Protocol stack 1024 also includes a session layer Real Time Transport Protocol (RTP) module 1040, a Session Announcement Protocol (SAP) module 1042, a Session Initiation Protocol (SIP) module 1044 and a Real Time Streaming Protocol (RTSP) module 1046. Protocol stack 1024 includes a presentation layer media negotiation module 1048, a call control module 1050, one or more audio codecs 1052 and one or more video codecs 1054. Applications 1022 may be able to create, maintain and/or terminate communication sessions with any of remote servers 1007 by way of AP 1006. Typically, applications 1022 may activate any of the SAP, SIP, RTSP, media negotiation and call control modules for that purpose. Typically, information may propagate from the SAP, SIP, RTSP, media negotiation and call control modules to PHY module 1026 through TCP module 1038, IP module 1034, LLC module 1032 and MAC module 1030.

It would be apparent to one skilled in the art that elements of the PED 1004 may also be implemented within the AP 1006 including but not limited to one or more elements of the protocol stack 1024, including for example an IEEE 802.11-compatible PHY module, an IEEE 802.11-compatible MAC module, and an IEEE 802.2-compatible LLC module 1032. The AP 1006 may additionally include a network layer IP module, a transport layer User Datagram Protocol (UDP) module and a transport layer Transmission Control Protocol (TCP) module as well as a session layer Real Time Transport Protocol (RTP) module, a Session Announcement Protocol (SAP) module, a Session Initiation Protocol (SIP) module and a Real Time Streaming Protocol (RTSP) module, media negotiation module, and a call control module.

As depicted remote server cluster 1007 comprises a firewall 1007A through which the discrete servers within the remote server cluster 1007 are accessed. Alternatively remote server cluster 1007 may be implemented as multiple discrete independent servers each supporting a predetermined portion of the functionality of remote server cluster 1007. As presented the discrete servers include application servers 1007B dedicated to running certain software applications, communications server 1007C providing a platform for communications networks, database server 1007D providing database services to other computer programs or computers, web server 1007E providing HTTP clients connectivity in order to send commands and receive responses along with content, and proxy server 1007F that acts as an intermediary for requests from clients seeking resources from other servers.

Contextual Sentiment Classification:

Prior Art:

Within the prior art multiple approaches to classifying or assigning a sentiment for an item of content, typically a document or portion of a document, exist. However, these existing sentiment filtering approaches simply count occurrences of positive and negative terms alongside a keyword to establish an overall sentiment. This analysis does not take into account the context in which those occurrences appear. For example, the phrase “Last night I drove to see Terminator 3 in my new Fiat 500, after eating at Stonewall, the truffle bison burger was great” would be interpreted as positive feedback even though the positive term is associated with the food rather than either the film “Terminator 3” or the vehicle “Fiat 500.” Accordingly, it would be beneficial for sentiment analysis of content to be contextually aware.
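The limitation can be reproduced with a deliberately naive scorer. The sketch below is illustrative only: the one-word term sets and the function name are hypothetical, and the scorer simply attaches the document-level polarity to every entity it finds, which is the context-unaware behaviour criticised above.

```python
POSITIVE = {"great"}
NEGATIVE = {"awful"}


def naive_entity_sentiment(text, entities):
    """Assign the whole document's polarity to every entity mentioned in it."""
    words = [w.strip(".,").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return {e: label for e in entities if e.lower() in text.lower()}


phrase = ("Last night I drove to see Terminator 3 in my new Fiat 500, "
          "after eating at Stonewall, the truffle bison burger was great")
result = naive_entity_sentiment(
    phrase, ["Terminator 3", "Fiat 500", "truffle bison burger"])
```

All three entities receive the same positive label, although only the burger was actually praised.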

Referring to FIGS. 2A and 2B there are depicted first and second schematic representations 200 and 2000 respectively of the prior art of Pang et al for sentiment classification, which employs the classic ‘bag-of-words’ feature representation for machine learning classification. Referring to first schematic 200 there is depicted a first stage of the prior art process wherein a learning process is performed. A training document set 205 is stored upon a server for example wherein the training document set 205 comprises a predetermined set of documents that serve as training examples for the prior art process wherein typically half of the training document set 205 are labelled as expressing positive sentiment, and the other half of the training document set 205 are labelled as expressing negative sentiment. The training document set 205 are then parsed in a feature vocabulary extraction process 210 to provide a unique set of words found in the training document set 205. Optionally these are stored with associated frequency counts. The “feature vocabulary list” extracted in feature vocabulary extraction process 210 is then optionally reduced through feature engineering 220 to a smaller set via thresholds which may for example be based on word frequencies, chi-squared distribution (also known as chi-square or χ² distribution), or information theoretic means for example. New features may also be introduced via documents or corpus analysis. The training document set 205 are then processed using a standard machine learning algorithm 230, such as for example Naïve Bayes, Support Vector Machines, and Maximum Entropy to generate a classification model 235 based on the association of provided features to the document sentiment labels.

Now referring to second schematic 2000 a second stage of the prior art is depicted wherein an input document 240 is to be analyzed for sentiment. A feature vocabulary 245 was used to generate a sentiment classification model 255 as discussed above in respect of first schematic 200 during a machine learning training process 230. Accordingly the input document 240 is processed by an initial document feature engineering 250 process which converts the input document 240 to a format that matches the features employed in the sentiment classification process 260 which is based upon a machine learning model 255. This transformation follows the same process as feature engineering 220 in first schematic 200 of FIG. 2A. Accordingly the sentiment classification process 260 assigns a sentiment label to the features derived from the input document 240 wherein the positive or negative sentiment is output as document sentiment label 270 and associated with the input document 240.
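The ‘bag-of-words’ feature representation described in respect of FIGS. 2A and 2B may be illustrated with a minimal Python sketch. It covers only vocabulary extraction with a frequency threshold and the conversion of an input document into a matching feature vector; no classifier is trained. The function names, the threshold value and the sample documents are assumptions made for illustration and do not appear in the prior art described.

```python
from collections import Counter


def extract_vocabulary(training_docs, min_count=2):
    """Return the sorted unique words seen at least `min_count` times.

    The frequency threshold is one simple, assumed form of feature engineering.
    """
    counts = Counter(word for doc in training_docs for word in doc.lower().split())
    return sorted(word for word, count in counts.items() if count >= min_count)


def to_feature_vector(document, vocabulary):
    """Count each vocabulary word in the document; other words are dropped."""
    counts = Counter(document.lower().split())
    return [counts[word] for word in vocabulary]


# Hypothetical training documents (labels omitted, as no model is trained here).
TRAINING_DOCS = [
    "great film great cast",
    "dull film",
    "great soundtrack",
    "dull plot dull pacing",
]

VOCABULARY = extract_vocabulary(TRAINING_DOCS)
VECTOR = to_feature_vector("great bison burger great film", VOCABULARY)
```

In this sketch the words “bison” and “burger” do not occur in the vocabulary and so contribute nothing to the feature vector, and word order is discarded entirely.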

Such prior art approaches suffer from a number of serious limitations, which are addressed by embodiments of the current invention. The limitations include the fact that the sentiment label 270 applied to an input document 240 is not explainable. Most machine-learning based classification systems generate an opaque high-dimensional model such that the sentiment label associated with a document cannot be mapped back to the document, and thus there is no easily understandable method to describe how the class-association statistics associated with individual features are used to derive the sentiment label. This “black-box” nature of the machine learning classifier can unnerve those who depend professionally on the veracity of the sentiment label to make business decisions.

Additionally the performance of these supervised machine learning techniques is dependent on the degree to which the training data set and testing data match with respect to domain, topic and time-period. However, it would be evident that a term may provide positive or negative sentiment depending upon its context and accordingly should not form part of the feature vocabulary without regard to that context. For example the word “conservative” may be considered to have positive sentiment in content from the financial domain, but may have negative sentiment in content relating to movie reviews or an artistic genre. Accordingly prior art machine learning based solutions do not ensure that the sentiment associated with a document's constituent terms is derived from the same sentiment context as the document. Without this domain match, highly descriptive words in a testing or production document may have a different sentiment than those given in the training document set. Prior art techniques also do not arrive at the sentiment label by a rigorous linguistic analysis of the document.

It would also be evident that the prior art machine learning classification approaches can only operate on information that they have encountered before, i.e. only those features are supported that were included in the training document set's vocabulary. Occurrences of “unseen” words, i.e. words not within the training document set which are extracted into the feature vocabulary set, are essentially ignored. Another limitation within prior art techniques is the inability to classify small documents, especially data sets derived from cellular SMS messages or Twitter status updates for example, as these documents are too small to be accurately classified by machine learning based sentiment classifiers. However, in many instances such documents are desirable as the focus of sentiment classification as a substantial negative or positive sentiment across SMS messages, Tweets, or Facebook status updates provides rapid near real-time analysis of an event or occurrence. For example, a broadcaster upon broadcasting a potentially controversial episode or program may gauge their viewers' responses as the broadcast progresses and track the subsequent evolution of demographic breakdowns in sentiment or evolution of consensus.

Contextual Sentiment Classification—Sentiment Classification Process:

The contextual sentiment classification of content according to embodiments of the invention is achieved through use of two core processes. These are a sentiment classification process which exploits a target-domain sentiment lexicon and generation of the target-domain sentiment lexicon. Referring to FIG. 3 there is presented an overview process flowchart 300 according to an embodiment of the invention by which an input document 310 is labelled with a sentiment label 370, with optional sentiment intensity, as an output of the overview process flowchart 300, via a linguistic parser 320, term selection rules 340, target-domain sentiment lexicon 350, and document sentiment labelling rules 380. The sentiment label 370 is generated in dependence upon one or more sentiment labelled terms 360 generated through the process.

Accordingly the process begins with input content, document 310, which is transformed via a parser 320 into an annotated form with associations including, but not limited to, part-of-speech, phrasal chunks, and grammatical relations associated with terms that constitute the input content, document 310. Rules retrieved from a term selection rules repository 340 are then employed to derive a set of candidate sentiment carrying terms, selected terms 330, from the annotated version of the document 310 generated by parser 320. Each selected term 330 is then queried in a target-domain sentiment lexicon 350 to create a list of terms, the sentiment labelled terms 360, with associated sentiment labels and optionally associated sentiment intensity. These sentiment labelled terms 360 with any associated elements are then employed with the linguistic annotated version of the document generated by the parser 320 to apply a set of document sentiment labeling rules 380 in order to generate a document sentiment label 370. Similarly optionally associated sentiment intensities can be employed in conjunction with the document sentiment labeling rules 380 to establish an optional sentiment intensity level for the document 310.
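The flow of FIG. 3 may be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the implementation of any embodiment: the toy part-of-speech table stands in for the linguistic parser 320, the single rule of keeping adjectives stands in for the term selection rules 340, the small dictionary stands in for the target-domain sentiment lexicon 350, and a signed sum of intensities stands in for the document sentiment labeling rules 380. All names and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexiconEntry:
    label: str      # "positive" or "negative"
    intensity: int  # assumed scale: 1 = weak, 2 = moderate, 3 = extreme


# Hypothetical target-domain sentiment lexicon for a restaurant-review domain.
RESTAURANT_LEXICON = {
    "great": LexiconEntry("positive", 2),
    "cold": LexiconEntry("negative", 1),
    "awful": LexiconEntry("negative", 3),
}

# Toy part-of-speech table; a real parser would supply these annotations.
TOY_POS = {"great": "ADJ", "cold": "ADJ", "awful": "ADJ",
           "burger": "NOUN", "soup": "NOUN", "was": "VERB"}


def parse(text):
    """Annotate each token with a part-of-speech tag ("X" if unknown)."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    return [(t, TOY_POS.get(t, "X")) for t in tokens if t]


def select_terms(annotated, allowed_pos=("ADJ",)):
    """Assumed term selection rule: keep only adjectives."""
    return [token for token, pos in annotated if pos in allowed_pos]


def label_terms(terms, lexicon):
    """Query each selected term in the lexicon; unknown terms are skipped."""
    return [(t, lexicon[t]) for t in terms if t in lexicon]


def label_document(labelled_terms):
    """Assumed labeling rule: sign and magnitude of the summed intensities."""
    score = sum(e.intensity if e.label == "positive" else -e.intensity
                for _, e in labelled_terms)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return label, abs(score)


def classify(text, lexicon):
    labelled = label_terms(select_terms(parse(text)), lexicon)
    label, intensity = label_document(labelled)
    return label, intensity, labelled
```

Because `classify` returns the sentiment labelled terms alongside the document label, a caller can show which terms led to the label, which is the explainability property discussed above.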

Optionally, the sentiment labelled terms 360 have associated with them one or more sentiment labels and optionally one or more associated sentiment intensities. For example, the term “git” may have the sentiment label of “hate” associated with an intensity of “weak” whereas “loathe” may have the same sentiment label of “hate” but an intensity of “extreme.” It would be evident to one skilled in the art that the target-domain sentiment lexicon 350 may be established in dependence upon the domain of the input content, document 310. The domain may be one or more fields, the fields including but not limited to, an area of human activity, an area of human interest, an area of human endeavour, a topic, a subject, an area of academic interest, an area of academic specialization, a profession, an aspect of business, an aspect of entertainment, and an aspect of personal relationships. The term selection rules repository 340 and the rules stored within it may optionally be established in dependence upon the domain of the input content or alternatively these may be established in dependence upon one or more factors including the enterprise/service provider executing the sentiment classification process, the software system and/or software system provider supplied repository and rules, user preferences, and preferences of a requestor of a sentiment analysis.

It would be evident to one skilled in the art that the process described above in respect of FIG. 3 may be applied to a plurality of documents to form the input content wherein the results of each of the plurality of documents may be reported individually or the results may be collated to provide a single determined sentiment or an analysis such as numbers expressing strong positive, positive, mildly positive, neutral, mildly negative, negative, and strong negative sentiment. Such analysis may include optionally reporting events of particular sentiments with intense or very strong sentiment. Optionally, the results of a sentiment analysis such as described supra may be employed in other processes, such as, for example, where the sentiment labelled terms become elements of core text to be extracted from a document through a salient content extraction process such that the result of such a process is a document or documents being reduced to the text associated with the sentiment labelled terms.

Contextual Sentiment Classification—Target-Domain Sentiment Lexicon Generation Process:

As noted supra the sentiment classification process exploits a target-domain sentiment lexicon and accordingly the generation of the target-domain sentiment lexicon, which is a separate process, is described here. Referring to FIG. 4 there is illustrated a process flowchart schematic 400 wherein an input term 410 is assigned a target-domain sentiment label, with an optional sentiment intensity, within a sentiment lexicon 480 by analyzing the co-occurrence counts of this input term 410 with negative sentiment seed terms 420 and positive sentiment seed terms 430 in a target-domain document set 440.

The process flowchart schematic 400 depicting the lexicon generation process is based upon a determination process. This process is based upon generating two counts. The first count is of documents in the target-domain document set 440 containing both an input term 410 and one or more negative sentiment seed terms of the set of negative sentiment seed terms 420, and is stored as the negative sentiment seed co-occurrence count 450. The second count is of documents in the target-domain document set 440 containing both an input term 410 and one or more positive sentiment seed terms of the set of positive sentiment seed terms 430, and is stored as the positive sentiment seed co-occurrence count 460. Optionally, the co-occurrence counts, being negative sentiment seed co-occurrence count 450 and positive sentiment seed co-occurrence count 460, may count co-occurrences in one or more of paragraphs, sentences, sliding windows of words (optionally truncated by sentence end punctuations), and via grammatical relations.

The negative and positive seed term co-occurrence counts 450 and 460 respectively are analyzed to determine the target-domain sentiment label of the term, the sentiment label of term 470. Subsequently the input term, sentiment label, and (optionally) count information are reported to a user as shown in the process by Report Sentiment 475 and are also stored into a target-domain sentiment lexicon 480. The analysis and determination of the sentiment label of term 470 may for example simply be the higher score if the negative term counts, negative sentiment seed co-occurrence count 450, are approximately equal to the positive term counts, positive sentiment seed co-occurrence count 460. Alternatively, if the classes are imbalanced the analysis may involve a normalization step to reduce the weighting of the more frequent class, or terms within each of the negative and positive seed term co-occurrence counts 450 and 460 respectively may have weightings associated with them such that certain terms if occurring in a document have higher weighting than others.
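The document-level co-occurrence counting of FIG. 4 may be sketched in Python as follows. This is a minimal sketch: whitespace tokenization, the seed terms, the sample documents and the particular normalization (dividing each count by the number of documents containing any seed term of that class) are all assumptions chosen for illustration, the text above leaving the form of the normalization open.

```python
def cooccurrence_count(term, seed_terms, documents):
    """Count documents containing both the term and at least one seed term."""
    count = 0
    for doc in documents:
        words = set(doc.lower().split())
        if term in words and words & seed_terms:
            count += 1
    return count


def seed_document_count(seed_terms, documents):
    """Count documents containing at least one seed term of a class."""
    return sum(1 for doc in documents if set(doc.lower().split()) & seed_terms)


def label_term(term, negative_seeds, positive_seeds, documents, normalize=False):
    """Return (label, negative count, positive count) for the input term."""
    neg = cooccurrence_count(term, negative_seeds, documents)
    pos = cooccurrence_count(term, positive_seeds, documents)
    neg_score, pos_score = float(neg), float(pos)
    if normalize:
        # Assumed normalization: scale by how common each seed class is.
        neg_total = seed_document_count(negative_seeds, documents)
        pos_total = seed_document_count(positive_seeds, documents)
        neg_score = neg / neg_total if neg_total else 0.0
        pos_score = pos / pos_total if pos_total else 0.0
    if pos_score > neg_score:
        label = "positive"
    elif neg_score > pos_score:
        label = "negative"
    else:
        label = "neutral"
    return label, neg, pos


# Hypothetical seed terms and a tiny financial-domain document set.
POSITIVE_SEEDS = {"good", "excellent"}
NEGATIVE_SEEDS = {"poor", "bad"}
FINANCE_DOCS = [
    "conservative strategy delivered good returns",
    "conservative outlook and excellent margins",
    "conservative guidance but poor sales",
    "risky bet with poor results",
]
```

With this document set, `label_term("conservative", NEGATIVE_SEEDS, POSITIVE_SEEDS, FINANCE_DOCS)` yields a positive label with a negative count of 1 and a positive count of 2; a different document set for another domain could yield a different label for the same term.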

It would be evident that input term 410 may be an item of content without any prior consideration or analysis and hence may be an item of content retrieved from one or more sources as discussed above in respect of FIG. 1 or may be an item of content received in real time such that for example Twitter tweets or Facebook posts may be analysed as they are published thereby allowing an organization the ability to monitor sentiments in essentially real-time. It would also be evident that the item of content may be a single document, such as for example a marketing report or a customer comment received online; a collection of documents; a webpage such as for example a blog, a reporter's column, a competitor's product, or a consumer organization's report; or a web domain such that all content within the web domain is analysed such as for example web domains for consumer organizations, newspapers, magazines, competitors, and retailers. It would be further evident that input term 410 may be initially filtered for an occurrence of a particular keyword, subset of a set of keywords, or all keywords in a set of keywords. Optionally the content may also be processed such that locations of the negative and positive sentiment seed terms relative to one or more keywords are determined and only those meeting a predetermined threshold condition are counted into the respective negative and positive sentiment seed co-occurrence counts.

The content in addition to a social network status update may therefore as discussed and presented supra include, but not be limited to, other content such as an email, a news article, a blog post, a forum comment, a stock report, a news cast, a web page, or any other form of user generated content and/or content generated from an editorial process. The document may have a structure, such as for example including a title, body, and summary, with one or more paragraphs. The structure could be in the form of a template or a frame. Accordingly sentiment analysis may be performed on these structural elements independently to provide multiple sentiments for the item of content or be combined with a weighting in dependence of the structure to provide a sentiment for the content overall. For example, sentiments within the title and summary may be weighted higher than those within the body of the content.
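The structure-dependent weighting described above may be sketched in Python as a weighted average of per-element sentiment scores. The score range of −1 to +1, the weight values and the element names are assumptions made for illustration only.

```python
def combine_sections(section_scores, weights):
    """Weighted average of per-section sentiment scores.

    section_scores: mapping of section name to a score in [-1.0, +1.0].
    weights: mapping of section name to a non-negative weight.
    """
    total = sum(weights[name] * score for name, score in section_scores.items())
    norm = sum(weights[name] for name in section_scores)
    return total / norm if norm else 0.0


# Hypothetical per-section scores and weights favouring title and summary.
SECTION_SCORES = {"title": 1.0, "summary": 0.5, "body": -0.5}
STRUCTURE_WEIGHTS = {"title": 3.0, "summary": 2.0, "body": 1.0}
EQUAL_WEIGHTS = {"title": 1.0, "summary": 1.0, "body": 1.0}

WEIGHTED = combine_sections(SECTION_SCORES, STRUCTURE_WEIGHTS)
UNWEIGHTED = combine_sections(SECTION_SCORES, EQUAL_WEIGHTS)
```

In this example the weighted result is more positive than the unweighted one because the positive title and summary count for more than the negative body.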

Optionally, according to another embodiment of the invention a domain-detection component may be provided which identifies the domain of an input document, and employs this domain-identification-tag to choose one (or more) target-domain sentiment lexicons from a plurality of stored lexicons. According to another embodiment of the invention a sentiment may be provided with an ordinal scale, for example from {0,1}, {−1,+1}, {−2,+5}, or {−5,+5}.

In another embodiment of the invention, in addition to the sentiment label for the document, a set of sentiment labels, with optional intensity metrics, could be provided for each constituent term in the document. Optionally the sentiment returned for the document could also contain psychological tone qualifications, such as anger, affinity, disgust, sorrow, etc. based upon exploiting known emotion and attitude ontologies.

The invention could also be combined with a display method which can show the document and the associated sentiment, with optional annotations on selected lexical units that serve to explain the sentiment provided thereby.

Accordingly, advantages of embodiments of the invention include:

-   providing improved sentiment analysis as the sentiment generated is based on a targeted-domain sentiment lexicon;
-   domain-independent sentiment analysis can be provided when a contextual sentiment analysis system is coupled with a large sample of documents that pertain to a plurality of subjects of interest to a variety of readers;
-   ability to describe why a sentiment label has been applied to a document by providing the underlying sentiment(s) associated with selected terms in the document;
-   a parser is employed to select the salient terms from the document thereby allowing the system to assign sentiment to only the relevant sentiment-carrying terms.

It would be evident that beneficially the parser allows for identification of the syntactic and semantic linguistic roles of the terms that constitute the document being analyzed for sentiment. Further by employing a set of document sentiment labeling rules, that operate on the syntactic, semantic and sentiment meta-data associated with the terms constituting a document, embodiments of the invention can generate a sentiment based on the linguistic structure of the document, rather than employing the prior art linguistic-structure-bereft ‘bag-of-words’ machine learning sentiment analysis framework.

Contextual Sentiment Classification—Multi-Document Key Concept Generation and Sentiment Association Process:

Referring to FIG. 5 there is depicted a process flowchart 500 according to an embodiment of the invention for associating key concepts within multiple documents and associating sentiments to the key concepts. As depicted process flowchart 500 begins at step 505 wherein the document set is selected by one or more methods including, but not limited to, manual selection by the user, automatically by an application in execution associated with the user, automatically by an application in execution upon a software system associated with a service subscribed to by the user, and an application in execution upon a software system associated with a software application employed by the user. The process then proceeds to step 510 wherein the core multi-document concepts are identified. These core multi-document concepts are identified, for example, using a ranking technique including, but not limited to, frequency-based ranking, chi-square, mutual information, k-means clustering, and vector-space centroids. The process then proceeds to step 515 wherein the list of key concepts may be filtered to reduce the derived, optionally ranked, list via one or more techniques including, but not limited to, threshold based cutoff, top predetermined number, confidence scores or by comparing with a stop-word list which consists of terms to be excluded as key concepts.
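Steps 510 and 515 may be sketched in Python for the frequency-based ranking case. The sketch ranks terms by the number of documents in which they occur, then filters by a stop-word list, a threshold cutoff and a top predetermined number. Whitespace tokenization, the function name, the parameter defaults and the sample data are assumptions for illustration.

```python
from collections import Counter


def rank_concepts(documents, stop_words, top_n=3, min_docs=2):
    """Rank candidate key concepts by document frequency, then filter."""
    doc_freq = Counter()
    for doc in documents:
        # Count each term at most once per document.
        doc_freq.update(set(doc.lower().split()))
    candidates = [
        (term, count) for term, count in doc_freq.items()
        if term not in stop_words and count >= min_docs
    ]
    # Highest count first; ties broken alphabetically for a stable order.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return candidates[:top_n]


# Hypothetical document set and stop-word list.
REVIEW_DOCS = [
    "the battery life is great",
    "battery drains fast",
    "the screen is great and the battery lasts",
    "screen cracked",
]
STOP_WORDS = {"the", "is", "and"}

TOP_CONCEPTS = rank_concepts(REVIEW_DOCS, STOP_WORDS, top_n=2)
```

Other ranking techniques named above, such as chi-square or mutual information, would replace only the scoring step; the filtering step would be unchanged.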

In step 520 the core multi-document concept is selected, e.g. highest ranking, wherein the process proceeds to step 525 for a determination as to the method to be employed, the methods being shown as “Document Summary” and “All Occurrences”. If “Document Summary” is selected, for example by the user, via a preference within the software application and/or software system, in dependence upon the number of documents, and in dependence upon the core multi-document concept, then the process proceeds to step 530 wherein a document based sentiment for the given key concept is obtained for a document within the document set. In step 535 the process determines whether all documents within the document set have had document based sentiments established wherein the process loops back to step 530 when further documents remain or proceeds to step 540 wherein counts are generated for the positive, negative and neutral sentiments establishing for how many documents each sentiment is the overall sentiment. Then in step 545 the user is presented with the category with the largest sentiment count, or alternatively is presented with the results for all three categories. The largest sentiment count category may then be employed according to embodiments of the invention for a variety of subsequent processes, such as for example rewarding customers within that category for their feedback, which may in some instances be negative feedback; avoiding automatically rewarding only good feedback may result in more honest feedback. Alternatively, the sentiment result may be employed to trigger other activities or events such as searching for that sentiment within a new document set.

If in step 525 the “All Occurrences” method was selected then the process proceeds to step 550 wherein the context-count-based sentiment for a given key concept is established by identifying the sentiment associated with each and every instance of the key concept as it occurs in each document being processed. Accordingly, the process then proceeds to step 545 again to present, for example, an indicator that indicates the sentiment of the term based on the sentiment label derived using the results from step 550 via simple addition or through other sentiment classification techniques. The indication may for example be a colour coding, audiovisual coding, or another indicator as known within the art.
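The two methods of step 525 may be contrasted in a short Python sketch. It assumes that the per-document labels and the per-instance scores for the key concept have already been obtained by some sentiment classification process; the function names, the signed integer scores and the sample inputs are hypothetical, and simple addition is used for the “All Occurrences” case.

```python
CATEGORIES = ("positive", "negative", "neutral")


def document_summary(doc_labels):
    """Count documents per overall sentiment and return the largest category."""
    counts = {category: doc_labels.count(category) for category in CATEGORIES}
    # On a tie, the first category in CATEGORIES order is returned.
    largest = max(CATEGORIES, key=lambda category: counts[category])
    return largest, counts


def all_occurrences(instance_scores_per_doc):
    """Add the signed score of every instance of the concept in every document."""
    total = sum(score for doc in instance_scores_per_doc for score in doc)
    label = "positive" if total > 0 else "negative" if total < 0 else "neutral"
    return label, total


# Hypothetical inputs for one key concept across a small document set.
DOC_LABELS = ["positive", "negative", "positive", "neutral"]
INSTANCE_SCORES = [[+1, -1], [-1], [-1, +1, -1]]

SUMMARY = document_summary(DOC_LABELS)
OCCURRENCES = all_occurrences(INSTANCE_SCORES)
```

The two methods can disagree for the same document set, since a single document containing many instances of the concept counts once under “Document Summary” but many times under “All Occurrences”.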

It would be evident that other statistical techniques and approaches may be employed in establishing the core multi-document concepts including identification by the user, identification by the software applications and/or software system using previously stored index terms, and entry of a search term and/or terms into a software application such as an Internet browser for example. Optionally, the filtering step 515 may be omitted or replaced with a user selection using a graphical user interface according to one or more techniques known in the prior art. As presented steps 525 through 550 of process flowchart 500 are depicted as occurring once for the top ranked core multi-document concept. However, it would be evident to one skilled in the art that these steps may be repeated for one or more of the core multi-document concepts resulting from the filtering step 515. For example, the top 5 concepts may be automatically processed or all concepts exceeding a threshold may be processed.

It would be evident that more or fewer categories may be established for the multi-document sentiment analysis of the document set or that the process may be re-run once a particular overall sentiment has been assessed to refine the analysis, for example negative may be subsequently assessed for anger, frustration, or calm. Within the embodiments of the invention a document within a document set may refer, for example, to an article, a blog, a social media post, an email, a comment posted to a website, a word processing document, an office document, a response to a survey, an item of multimedia content, and an item of audiovisual content. Optionally, the results from the process flowchart 500 relating to a sentiment analysis of a core concept or core concepts within a document set may be communicated through the software application or another software application, e.g. an electronic mail application, for distribution. Accordingly, a user may establish a sentiment analysis upon a software system and/or software application which periodically selects a predetermined number of documents to form a document set from a larger volume of documents and transmits the result of sentiment analysis and core concepts to the user such that for example a news service may not only identify the currently trending topics within say, Twitter™, but also automatically obtain the sentiment analysis associated with these.

Specific details are given in the above description to provide a thorough understanding of the embodiments. However, it is understood that the embodiments may be practiced without these specific details. For example, circuits may be shown in block diagrams in order not to obscure the embodiments in unnecessary detail. In other instances, well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail in order to avoid obscuring the embodiments.

Implementation of the techniques, blocks, steps and means described above may be done in various ways. For example, these techniques, blocks, steps and means may be implemented in hardware, software, or a combination thereof. For a hardware implementation, the processing units may be implemented within one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, micro-controllers, microprocessors, other electronic units designed to perform the functions described above and/or a combination thereof.

Also, it is noted that the embodiments may be described as a process which is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be rearranged. A process is terminated when its operations are completed, but could have additional steps not included in the figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination corresponds to a return of the function to the calling function or the main function.

Furthermore, embodiments may be implemented by hardware, software, scripting languages, firmware, middleware, microcode, hardware description languages and/or any combination thereof. When implemented in software, firmware, middleware, scripting language and/or microcode, the program code or code segments to perform the necessary tasks may be stored in a machine readable medium, such as a storage medium. A code segment or machine-executable instruction may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a script, a class, or any combination of instructions, data structures and/or program statements. A code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters and/or memory contents. Information, arguments, parameters, data, etc. may be passed, forwarded, or transmitted via any suitable means including memory sharing, message passing, token passing, network transmission, etc.

For a firmware and/or software implementation, the methodologies may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. Any machine-readable medium tangibly embodying instructions may be used in implementing the methodologies described herein. For example, software codes may be stored in a memory. Memory may be implemented within the processor or external to the processor and may vary in implementation where the memory is employed in storing software codes for subsequent execution to that when the memory is employed in executing the software codes. As used herein the term “memory” refers to any type of long term, short term, volatile, nonvolatile, or other storage medium and is not to be limited to any particular type of memory or number of memories, or type of media upon which memory is stored.

Moreover, as disclosed herein, the term “storage medium” may represent one or more devices for storing data, including read only memory (ROM), random access memory (RAM), magnetic RAM, core memory, magnetic disk storage mediums, optical storage mediums, flash memory devices and/or other machine readable mediums for storing information. The term “machine-readable medium” includes, but is not limited to portable or fixed storage devices, optical storage devices, wireless channels and/or various other mediums capable of storing, containing or carrying instruction(s) and/or data.

The methodologies described herein are, in one or more embodiments, performable by a machine which includes one or more processors that accept code segments containing instructions. For any of the methods described herein, when the instructions are executed by the machine, the machine performs the method. Any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine is included. Thus, a typical machine may be exemplified by a typical processing system that includes one or more processors. Each processor may include one or more of a CPU, a graphics-processing unit, and a programmable DSP unit. The processing system further may include a memory subsystem including main RAM and/or a static RAM, and/or ROM. A bus subsystem may be included for communicating between the components. If the processing system requires a display, such a display may be included, e.g., a liquid crystal display (LCD). If manual data entry is required, the processing system also includes an input device such as one or more of an alphanumeric input unit such as a keyboard, a pointing control device such as a mouse, and so forth.

The memory includes machine-readable code segments (e.g. software or software code) including instructions for performing, when executed by the processing system, one or more of the methods described herein. The software may reside entirely in the memory, or may also reside, completely or at least partially, within the RAM and/or within the processor during execution thereof by the computer system. Thus, the memory and the processor also constitute a system comprising machine-readable code.

In alternative embodiments, the machine operates as a standalone device or may be connected, e.g., networked, to other machines. In a networked deployment, the machine may operate in the capacity of a server or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer or distributed network environment. The machine may be, for example, a computer, a server, a cluster of servers, a cluster of computers, a web appliance, a distributed computing environment, a cloud computing environment, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. The term “machine” may also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.

The foregoing disclosure of the exemplary embodiments of the present invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many variations and modifications of the embodiments described herein will be apparent to one of ordinary skill in the art in light of the above disclosure. The scope of the invention is to be defined only by the claims appended hereto, and by their equivalents.

Further, in describing representative embodiments of the present invention, the specification may have presented the method and/or process of the present invention as a particular sequence of steps. However, to the extent that the method or process does not rely on the particular order of steps set forth herein, the method or process should not be limited to the particular sequence of steps described. As one of ordinary skill in the art would appreciate, other sequences of steps may be possible. Therefore, the particular order of the steps set forth in the specification should not be construed as limitations on the claims. In addition, the claims directed to the method and/or process of the present invention should not be limited to the performance of their steps in the order written, and one skilled in the art can readily appreciate that the sequences may be varied and still remain within the spirit and scope of the present invention.

What is claimed is:
 1. A method comprising: receiving an item of content; parsing the item of content with a microprocessor to generate a linguistic annotated item of content with language associations; retrieving from a term selection rules repository stored upon a memory at least a rule of a plurality of rules; applying with the microprocessor the at least a rule of the plurality of rules to establish a set of candidate sentiment carrying terms within the linguistic annotated item of content; querying the set of candidate sentiment carrying terms against a target-domain sentiment lexicon to generate a set of sentiment labeled terms; and applying to the linguistic annotated item of content a set of sentiment labeling rules established in dependence of at least the set of sentiment labeled terms to generate a sentiment label for the item of content.
 2. The method according to claim 1 wherein, the language associations are at least one of parts of speech, phrasal elements, and grammatical relations associated with terms that form a predetermined portion of the item of content.
 3. The method according to claim 1 wherein, each sentiment labeled term is associated with at least one of a sentiment label and a sentiment intensity.
 4. The method according to claim 3 wherein, the at least one of the sentiment label and the sentiment intensity are employed in the application to the linguistic annotated item of content of the set of sentiment labeling rules.
 5. A method comprising: a) receiving an item of content; b) receiving upon a microprocessor an indication of a predetermined portion of the item of content to analyze; c) establishing with the microprocessor a plurality of positive sentiment terms and a plurality of negative sentiment terms; d) parsing with the microprocessor the predetermined portion of the item of content to count occurrences of a positive sentiment term of the plurality of positive sentiment terms to establish a positive sentiment count; e) parsing with the microprocessor the predetermined portion of the item of content to count occurrences of a negative sentiment term of the plurality of negative sentiment terms to establish a negative sentiment count; and f) determining with the microprocessor a sentiment label to associate with the item of content in dependence upon at least one of the occurrences of the positive sentiment term and occurrences of the negative sentiment term.
 6. The method according to claim 5 wherein, each positive sentiment term of the plurality of positive sentiment terms has an associated positive intensity level; each negative sentiment term of the plurality of negative sentiment terms has an associated negative intensity level.
 7. The method according to claim 6 wherein, counting occurrences of the positive sentiment terms of the plurality of positive sentiment terms is achieved by: determining a number of occurrences for each positive sentiment term; multiplying the number of occurrences for each positive sentiment term by its respective intensity level to generate a weighted occurrence count; summing the resulting weighted occurrence counts for the plurality of positive sentiment counts to generate the positive sentiment count; and counting occurrences of the negative sentiment terms of the plurality of negative sentiment terms is achieved by: determining a number of occurrences for each negative sentiment term; multiplying the number of occurrences for each negative sentiment term by its respective intensity level to generate a weighted occurrence count; summing the resulting weighted occurrence counts for the plurality of negative sentiment counts to generate the negative sentiment count.
 8. The method of claim 5 further comprising; establishing a number of predetermined portions of the item of content in step (a) and associating with each predetermined portion of the item of content a portion weighting; steps (b) to (e) are repeated for a number of predetermined portions of the item of content; and step (f) now comprises multiplying for each predetermined portion of the item of content the positive and negative sentiment counts by the respective portion weighting for that predetermined portion of the item of content to generate portion weighted positive and negative sentiment counts respectively and summing the results for all predetermined portions of the item of content.
 9. The method according to claim 5 further comprising; determining with the microprocessor a domain associated with the item of content in step (a); and selecting with the microprocessor a sentiment lexicon of a plurality of sentiment lexicons, the selection made in dependence upon at least the domain.
 10. The method according to claim 5 wherein, determining the sentiment label is at least one of: also dependent upon the imbalance between the counts of occurrences of the positive sentiment term and negative sentiment term; and selecting a sentiment label that is not one of either the positive sentiment term or negative sentiment term used in establishing the occurrences.
 11. The method according to claim 5 wherein, generating the sentiment label is achieved in dependence upon at least one of the difference, the sum, the ratio of the occurrences of the positive sentiment term and occurrences of the negative sentiment term, the positive sentiment term, and the negative sentiment term.
 12. The method according to claim 5 wherein, generating a psychological tone qualification in dependence upon at least one of the difference, the sum, the ratio of the occurrences of the positive sentiment term and occurrences of the negative sentiment term, the positive sentiment term, and the negative sentiment term.
 13. The method of claim 5 further comprising; repeating step (d) for each positive sentiment term of the plurality of positive sentiment terms and each negative sentiment term of the plurality of negative sentiment terms; and step (f) now comprises summing the results for all of the plurality of positive sentiment terms and determining with the microprocessor the sentiment label to associate with the item of content in dependence upon at least one of the occurrences of all positive sentiment terms of the plurality of positive sentiment terms and occurrences of all negative sentiment terms of the plurality of the negative sentiment terms.
 14. The method according to claim 11 further comprising; generating a psychological tone qualification in dependence upon at least one of the distribution of occurrences of all positive sentiment terms of the plurality of positive sentiment terms and the distribution of occurrences of all negative sentiment terms of the plurality of the negative sentiment terms.
 15. The method according to claim 5 further comprising; determining with the microprocessor a domain associated with the item of content in step (a); and determining a sentiment to associate to an item of content, the determination being in dependence upon at least the domain and the sentiment label.
 16. A method comprising: receiving an item of content; processing with a microprocessor the item of content to determine occurrences of content sentiment-carrying terms; displaying to a user the sentiment labels of content sentiment-carrying terms within the item of content; and presenting to the user any sentiment intensity variation based on matching at least one of a predetermined sentence and a phrasal syntactic structure of the document with a repository of syntactic structure patterns.
 17. The method according to claim 16 wherein, the sentiment intensity variation is at least one of an increase, a decrease, neutralization and a reversal.
 18. The method of claim 16 wherein, describing any sentiment intensity variation is based upon matching the sentiment of at least two adjacent sentiment-evaluated sentences with the repository of syntactic structure patterns.
 19. The method of claim 16 further comprising, allowing the user to select at least one of the sentiment carrying terms, sentences and rhetorical structures to access an explanation relating to how the derived sentiment label is associated with the clicked entity.
 20. A method comprising: a) receiving a plurality of items of content; b) identifying with a microprocessor within the plurality of items of content at least a core multi-item concept of a plurality of core multi-item concepts, each core multi-item concept relating to a concept contained at least within a predetermined portion of the plurality of items of content; c) selecting a core multi-item concept from the plurality of core multi-item concepts; and d) establishing with the microprocessor a sentiment relating to the core multi-item concept for the plurality of items of content.
 21. The method according to claim 20 wherein, the sentiment relating to the core multi-item concept for the plurality of content is established by at least one of: e) determining a count based sentiment for the core multi-item concept for each item of content of the plurality of items of content; and establishing the sentiment in dependence upon at least the plurality of document count based sentiment; and f) determining a context count based sentiment by identifying each instance of the core multi-item concept within the plurality of items of content.
 22. The method according to claim 20 further comprising: repeating steps (c) and (d) for a predetermined subset of the plurality of multi-item concepts; and presenting at least one of the predetermined subset of the plurality of multi-item concepts to the user together with its associated sentiment.
 23. The method according to claim 20 further comprising: e) receiving a second plurality of items of content; f) repeating steps (c) and (d) for the same core multi-item concept; g) presenting to a user at least one of: the original sentiment and a variance established in dependence upon at least the original sentiment and the new sentiment.
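
By way of non-limiting illustration only, the count-based method of claims 5 to 8 may be sketched as follows. The lexicon contents, intensity levels, portion weightings, tokenization, and the function names weighted_count and label_item are assumptions made for this sketch and are not taken from the specification; the decision rule shown (the sign of the difference between the weighted counts) is only one of the options recited in claim 11.

```python
import re
from collections import Counter

# Hypothetical lexicons mapping each term to an intensity level (claim 6).
# The terms and values are illustrative assumptions only.
POSITIVE = {"good": 1.0, "excellent": 2.0}
NEGATIVE = {"bad": 1.0, "terrible": 2.0}


def weighted_count(text, lexicon):
    """Sum of (occurrences x intensity level) over the lexicon terms (claim 7)."""
    tokens = Counter(re.findall(r"[a-z']+", text.lower()))
    return sum(tokens[term] * level for term, level in lexicon.items())


def label_item(portions):
    """portions: list of (text, portion_weighting) pairs (claim 8)."""
    pos = sum(w * weighted_count(t, POSITIVE) for t, w in portions)
    neg = sum(w * weighted_count(t, NEGATIVE) for t, w in portions)
    # Assumed decision rule: compare the two portion weighted counts.
    if pos > neg:
        label = "positive"
    elif neg > pos:
        label = "negative"
    else:
        label = "neutral"
    return label, pos, neg


# Example item of content: a title weighted more heavily than the body text.
item = [
    ("Excellent results", 2.0),
    ("The launch was good but support was bad.", 1.0),
]
print(label_item(item))  # ('positive', 5.0, 1.0)
```

In this sketch the title contributes 2.0 x 2.0 = 4.0 and the body 1.0 x 1.0 = 1.0 to the positive count, while the body contributes 1.0 to the negative count, so the item is labeled positive. A domain-specific lexicon (claim 9) could be substituted by passing different dictionaries.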